Data Explorations

OPENING QUESTIONS: Yesterday's data portal suggestion is HERE (be careful: even though it says "large", not all of the data sets are actually all that large)

Here's another data portal: The NOAA Global Monitoring Lab

OBJECTIVES: 

I will find my own "large" data set (~20,000 - 25,000 records)

I will use Google Sheets to create a graph of that data

I will use formatting options to force Google Sheets to recognize my data as a specific type (time, date, decimal/floating point, etc.)

I will use Google Sheets to create a filter of that data to create a meaningful subset of that data (which will change the character of the graph in some meaningful way)
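
For anyone curious what "forcing a type" looks like outside of Google Sheets, here is a minimal Python sketch. Downloaded data usually arrives as plain text, so each field has to be converted before it can be graphed. The column names (date, time, co2_ppm) and the sample values are made up for illustration and do not come from any particular portal.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; a real download would have thousands of rows.
RAW_CSV = """date,time,co2_ppm
2023-01-01,00:00,419.12
2023-01-02,12:30,419.47
2023-01-03,06:15,not available
"""

def parse_row(row):
    """Convert one row of text fields into typed values.

    Returns None if any field cannot be converted.
    """
    try:
        return {
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
            "time": datetime.strptime(row["time"], "%H:%M").time(),
            "co2_ppm": float(row["co2_ppm"]),
        }
    except ValueError:
        return None

def load_typed(text):
    """Read CSV text and keep only the rows that convert cleanly."""
    reader = csv.DictReader(io.StringIO(text))
    return [typed for typed in map(parse_row, reader) if typed is not None]

records = load_typed(RAW_CSV)
for record in records:
    print(record)
```

The third sample row is dropped because "not available" is not a number. That is the same kind of problem that keeps a spreadsheet from treating a whole column as decimal data.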

WORDS FOR TODAY:

  • Big Data - a broad term for datasets so large or complex that traditional data processing applications are inadequate.
  • Moore's Law - a prediction first made by Gordon Moore in 1965 (and later revised) that the number of transistors on a chip, and with it computing power, will double roughly every 1.5-2 years. It has remained more or less true ever since.
  • Field - A holder of data of a single data type. For example, an AGE field contains integer data that stores a person's age, and a field named FNAME contains character (text) data that stores a person's first name.
  • Record - A "Row" of data made up of related fields. For example, one record with the fields FNAME, LNAME, ADDRESS, CITY, STATE, ZIP would contain the data for one specific person.
  • Table - A "Table" of data is a collection of records (rows) that all share the same fields. For example, a spreadsheet is a table of data.
  • Data Types: int | long | boolean | date | text
  • Botnets: Nefarious users sometimes infect dozens, hundreds or even thousands of computers with malware that opens a certain port and listens for directions. The majority of the time that malware doesn't 'hear' anything. But very occasionally it receives instructions from the nefarious actor telling the infected computer to execute some sort of 'attack'.
  • Rogue Access Point: These used to be fairly common, but with widespread encryption and strong passwords they are much less so nowadays. Nonetheless, the AP wants you to be aware of them: Imagine you are sitting in your favorite coffee shop and the network is down. You scan your wifi network options and find one that is open, available and says "FreeNet". You click on that wifi and your laptop is connected to that network. The bad news is that whatever is flying across that wifi router can be intercepted. uh oh!
  • Filter: A method for selecting a subset of data from a larger data set
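
To make Field, Record, Table and Filter a bit more concrete, here is a small Python sketch. It assumes a table can be modeled as a list of dictionaries. The field names follow the examples above, while the people, the values, and the helper filter_records are made up for illustration.

```python
# Each dictionary is one record (row); each key is a field (column).
# The whole list plays the role of a table. All values are made up.
people = [
    {"FNAME": "Ana", "LNAME": "Reyes", "STATE": "WA", "AGE": 36},
    {"FNAME": "Ben", "LNAME": "Okafor", "STATE": "OR", "AGE": 41},
    {"FNAME": "Cy", "LNAME": "Larsen", "STATE": "WA", "AGE": 85},
]

def filter_records(table, field, keep):
    """Return the records whose value in `field` passes the test function `keep`."""
    return [record for record in table if keep(record[field])]

# Two different subsets of the same table.
washington = filter_records(people, "STATE", lambda state: state == "WA")
under_50 = filter_records(people, "AGE", lambda age: age < 50)

print([record["FNAME"] for record in washington])
print([record["FNAME"] for record in under_50])
```

A filter in Google Sheets does the same job: the full table stays intact, and only the rows that pass the test are shown (and graphed).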

A well crafted science/engineering graph MUST have:

1) A meaningful title. Example: "Wave Amplitudes in the North Pacific Following the Tonga Volcanic Eruption" and NOT "Wave Heights" or "Data" or "Graph of Waves"

2) Accurate physics term for the Y axis with units of measure in parentheses. Example: Amplitude (meters)

3) Accurate physics term for the X axis with units of measure in parentheses. Example: Time (sec)

4) A series of points with a line of best fit or a curve of best fit drawn through them

5) Appropriately spaced numeric values on the Y axis. Not too many, not too few. Goldilocks applies.

6) Appropriately spaced numeric values on the X axis. Not too many, not too few. Goldilocks applies.

HERE's such a graph!

Suggestions:

  • Horizontal grid lines can make your graph a whole lot easier to analyze. Using complementary colors so that your grid lines stand out but don't overwhelm your graph can be most helpful.

  • Vertical grid lines can make your graph a whole lot easier to analyze. Using complementary colors so that your grid lines stand out but don't overwhelm your graph can be most helpful.

  • Avoid clutter and other distractions.

  • You can add a 'key' if you think that will make your graph more easily readable. However, don't add a key if the values are obvious since that just clutters up your graph.

  • A line of best fit or a curve of best fit can be VERY helpful. That can be a bit tricky. Once again, using a complementary color for your line/curve of best fit can be very helpful, but make sure it doesn't overwhelm your graph.
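
If you are wondering where a line of best fit comes from, here is a minimal Python sketch of one common method, ordinary least squares. This is an assumption about the method: spreadsheet trendlines are generally computed along these lines, but this page does not say which method Google Sheets uses. The function name and the sample times and amplitudes are made up for illustration.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x varies on its own, and how much x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are the same, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up sample: time in seconds, amplitude in meters.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
amplitudes = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = line_of_best_fit(times, amplitudes)
print(f"amplitude = {slope:.2f} * time + {intercept:.2f}")
```

For this sample the slope works out to roughly 1.97 meters per second. Notice that the line does not pass through every point. It is the straight line that keeps the total squared vertical distance to the points as small as possible.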